Fitting Sentence Level Translation Evaluation with Many Dense Features
Authors
Abstract
Sentence level evaluation in MT has turned out far more difficult than corpus level evaluation. Existing sentence level metrics employ a limited set of features, most of which are rather sparse at the sentence level, and their intricate models are rarely trained for ranking. This paper presents a simple linear model exploiting 33 relatively dense features, some of which are novel while others are known but seldom used, and trains it under the learning-to-rank framework. We evaluate our metric on the standard WMT12 data, showing that it outperforms the strong METEOR baseline. We also analyze the contribution of individual features and the choice of training data, language-pair-specific vs. target-language-specific, providing new insights into this task.
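As a concrete illustration of the approach the abstract sketches, the fragment below trains a linear metric over dense sentence-level features with a pairwise learning-to-rank objective. The perceptron-style update rule, the three example features, and all data are illustrative assumptions, not the paper's actual feature set or learner.

```python
import numpy as np

def pairwise_rank_train(better_feats, worse_feats, epochs=100, lr=0.01):
    """Learn weights w such that w . x_better > w . x_worse for every
    human-ranked pair of candidate translations (perceptron-style updates)."""
    w = np.zeros(better_feats.shape[1])
    for _ in range(epochs):
        for xb, xw in zip(better_feats, worse_feats):
            if w.dot(xb) <= w.dot(xw):  # pair ranked incorrectly: update
                w += lr * (xb - xw)
    return w

def score(w, feats):
    """Metric score for one candidate: a linear combination of its dense features."""
    return float(w.dot(feats))

# Toy usage with 3 dense features per candidate (e.g. n-gram precision,
# recall, length ratio); each row pair encodes one human preference judgement.
better = np.array([[0.7, 0.6, 0.95], [0.5, 0.5, 1.00]])
worse  = np.array([[0.4, 0.3, 0.80], [0.2, 0.6, 1.30]])
w = pairwise_rank_train(better, worse)
print(score(w, np.array([0.6, 0.5, 0.90])))
```

Training directly on ranked pairs, rather than fitting absolute scores, matches how WMT human judgements are collected: annotators compare translations rather than grade them in isolation.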
Similar resources
Towards a Literary Machine Translation: The Role of Referential Cohesion
What is the role of textual features above the sentence level in advancing the machine translation of literature? This paper examines how referential cohesion is expressed in literary and non-literary texts and how this cohesion affects translation. We first show in a corpus study on English that literary texts use denser reference chains, expressing greater referential cohesion than news. W...
Regression and Ranking based Optimisation for Sentence Level Machine Translation Evaluation
Automatic evaluation metrics are fundamentally important for Machine Translation, allowing comparison of system performance and efficient training. Current evaluation metrics fall into two classes: heuristic approaches, like BLEU, and those using supervised learning trained on human judgement data. While many trained metrics provide a better match against human judgements, this comes at the co...
Using Machine Translation Evaluation Techniques to Determine Sentence-level Semantic Equivalence
The task of machine translation (MT) evaluation is closely related to the task of sentence-level semantic equivalence classification. This paper investigates the utility of applying standard MT evaluation methods (BLEU, NIST, WER and PER) to building classifiers to predict semantic equivalence and entailment. We also introduce a novel classification method based on PER which leverages part of s...
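For intuition, the sketch below computes PER, one of the metrics this abstract lists, as a candidate feature for an equivalence classifier. The formulation is a common textbook one (bag-of-words mismatch with a length penalty); the tokenization and example sentences are illustrative, not the paper's setup.

```python
from collections import Counter

def per(hyp_tokens, ref_tokens):
    """Position-independent error rate: bag-of-words mismatch between
    hypothesis and reference, normalized by reference length (one common
    formulation; lower is better)."""
    matches = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    # Subtract spurious extra words so a longer hypothesis is not rewarded.
    return 1.0 - (matches - max(0, len(hyp_tokens) - len(ref_tokens))) / len(ref_tokens)

# PER between the two sentences of a pair, computed in both directions,
# could serve as features for a semantic-equivalence/entailment classifier.
s1 = "the cat sat on the mat".split()
s2 = "a cat was sitting on the mat".split()
print(per(s1, s2), per(s2, s1))
```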
Regression for Sentence-Level MT Evaluation with Pseudo References
Many automatic evaluation metrics for machine translation (MT) rely on making comparisons to human translations, a resource that may not always be available. We present a method for developing sentence-level MT evaluation metrics that do not directly rely on human reference translations. Our metrics are developed using regression learning and are based on a set of weaker indicators of fluency a...
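A minimal sketch of the regression setup this abstract describes, assuming precomputed weak fluency and adequacy indicators per sentence. The feature choices, the made-up values, and the scikit-learn learner are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: weak indicators for one candidate sentence, e.g. language-model
# log-probability, n-gram overlap with pseudo references (other MT systems'
# outputs), and hypothesis/source length ratio. Values are invented.
X = np.array([[-35.2, 0.62, 0.97],
              [-61.8, 0.35, 1.20],
              [-30.4, 0.70, 1.02]])
y = np.array([4.1, 2.3, 4.5])  # human adequacy judgements on a 1-5 scale

model = LinearRegression().fit(X, y)
print(model.predict([[-40.0, 0.55, 1.05]]))  # predicted quality for a new sentence
```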
Publication date: 2014